make it possible to allow discovery errors for controllers #49495

Merged

deads2k
Contributor

@deads2k deads2k commented Jul 24, 2017

Update the discovery client to return partial discovery information and an error. Since we can aggregate API servers, discovery of some resources can fail independently. Callers of this function who want to tolerate the errors can do so; existing callers will still get an error and fail in their normal error-handling blocks.
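
A minimal sketch of the pattern described above, in plain Go rather than the real client-go code: a discovery-style function that keeps going when one group fails and returns both the partial results and a combined error. The names `discoverAll`, `fetchGroup`, and `exampleFetch`, and the group strings, are hypothetical and chosen for illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

// fetchGroup is a hypothetical stand-in for one per-group discovery call.
type fetchGroup func(group string) ([]string, error)

// exampleFetch simulates an aggregated setup where one group's backing
// API server is unavailable (assumed group names, not real ones).
func exampleFetch(group string) ([]string, error) {
	if group == "metrics.example.com/v1" {
		return nil, errors.New("backing API server unavailable")
	}
	return []string{"widgets"}, nil
}

// discoverAll queries every group and does not stop at the first failure.
// It returns whatever it collected, plus a non-nil error if any group failed.
func discoverAll(groups []string, fetch fetchGroup) (map[string][]string, error) {
	resources := make(map[string][]string)
	var failures []error
	for _, g := range groups {
		rs, err := fetch(g)
		if err != nil {
			failures = append(failures, fmt.Errorf("group %q: %v", g, err))
			continue
		}
		resources[g] = rs
	}
	if len(failures) > 0 {
		return resources, fmt.Errorf("unable to retrieve %d of %d groups: %v",
			len(failures), len(groups), failures)
	}
	return resources, nil
}

func main() {
	resources, err := discoverAll(
		[]string{"apps/v1", "metrics.example.com/v1"}, exampleFetch)
	// A tolerant caller inspects both values instead of bailing out on err.
	fmt.Println("groups discovered:", len(resources))
	fmt.Println("error:", err)
}
```

A caller that wants the old behaviour can still treat any non-nil error as fatal; only callers that opt in need to look at the partial map.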

@kubernetes/sig-api-machinery-misc @sttts

Make it possible to allow discovery errors for controllers

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 24, 2017
@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note-label-needed labels Jul 24, 2017
@deads2k
Contributor Author

deads2k commented Jul 24, 2017

/retest

@luxas luxas removed their assignment Jul 24, 2017
@@ -366,7 +367,10 @@ func GetAvailableResources(clientBuilder controller.ControllerClientBuilder) (ma

 	resourceMap, err := discoveryClient.ServerResources()
 	if err != nil {
-		return nil, fmt.Errorf("failed to get supported resources from server: %v", err)
+		utilruntime.HandleError(fmt.Errorf("unable to get all supported resources from server: %v", err))
Member

HandleError only logs here? Does this fix the issue where if I have two versions of a TPR, the cm crash loops?

cc @caesarxuchao

Contributor Author

> HandleError only logs here?

The choice is up to the admin I suppose, there's an env var to force it either way. Returning an error unconditionally fails, so this at least gives them the choice.

> Does this fix the issue where if I have two versions of a TPR, the cm crash loops?

I'm not familiar with this. Got an issue handy?

Contributor

This means that we continue, go into the if clause below and return an error. What have we won with this change?

Contributor Author

> This means that we continue, go into the if clause below and return an error. What have we won with this change?

It can return an error and the results it was able to get. Consider the aggregated case with one server of ten being down.
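
To make the "one server of ten" case concrete, here is a hedged sketch of a tolerant caller. It is not the controller manager's code: `partialDiscovery` and `availableResources` are hypothetical, and the standard `log` package stands in for `utilruntime.HandleError`. The caller logs the error, keeps whatever was discovered, and only fails when nothing at all came back.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// partialDiscovery is a hypothetical stand-in for a discovery call that can
// return usable results together with an error when some servers are down.
func partialDiscovery(total, down int) (map[string]bool, error) {
	found := make(map[string]bool)
	for i := 0; i < total-down; i++ {
		found[fmt.Sprintf("group-%d", i)] = true
	}
	if down > 0 {
		return found, fmt.Errorf("%d of %d API servers did not respond", down, total)
	}
	return found, nil
}

// availableResources tolerates partial failure: it logs the error and
// continues, returning an error only if nothing was discovered.
func availableResources(total, down int) (map[string]bool, error) {
	found, err := partialDiscovery(total, down)
	if err != nil {
		// Logging here plays the role of HandleError in the diff above.
		log.Printf("unable to get all supported resources from server: %v", err)
	}
	if len(found) == 0 {
		return nil, errors.New("unable to get any supported resources from server")
	}
	return found, nil
}

func main() {
	found, err := availableResources(10, 1)
	fmt.Println("usable groups:", len(found), "fatal error:", err)
}
```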

Member

The TPR issue @mikedanese mentioned is #22768 (comment). If a user creates multiple versions of a TPR, only the first version is operational, so the discovery here fails and causes the controller manager to crashloop.

@jbeda jbeda removed their assignment Jul 25, 2017
-		return nil, fmt.Errorf("failed to get supported resources from server: %v", err)
+		utilruntime.HandleError(fmt.Errorf("unable to get all supported resources from server: %v", err))
}
if resourceMap == nil {
Contributor

strange interface which returns a nil map without an error.

Contributor

at least compare the length to be a bit more complete. Also below...
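
The suggestion to compare the length can be illustrated with a small sketch. In Go, `len` of a nil map is 0, so a single length check covers both a nil map and an empty non-nil map, whereas `== nil` misses the second case. The helper name `hasResources` is hypothetical.

```go
package main

import "fmt"

// hasResources reports whether discovery produced anything usable.
// len of a nil map is 0, so this handles nil and empty maps alike.
func hasResources(m map[string]bool) bool {
	return len(m) > 0
}

func main() {
	var nilMap map[string]bool      // never initialized
	emptyMap := map[string]bool{}   // initialized but empty

	// A nil comparison distinguishes the two...
	fmt.Println(nilMap == nil, emptyMap == nil)
	// ...while the length check treats both as "nothing discovered".
	fmt.Println(hasResources(nilMap), hasResources(emptyMap))
}
```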

@@ -187,7 +187,7 @@ func (d *DiscoveryClient) ServerResourcesForGroupVersion(groupVersion string) (r
}

// serverResources returns the supported resources for all groups and versions.
-func (d *DiscoveryClient) serverResources(failEarly bool) ([]*metav1.APIResourceList, error) {
+func (d *DiscoveryClient) serverResources(_ bool) ([]*metav1.APIResourceList, error) {
Contributor

let's get rid of the failEarly argument.

@deads2k
Contributor Author

deads2k commented Jul 26, 2017

comments addressed.

@caesarxuchao
Member

/retest

@caesarxuchao
Member

/release-note-none

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note-label-needed labels Jul 26, 2017
@caesarxuchao
Member

Can we cherrypick it to 1.7? Without it, the controller manager will crashloop if a user registers multiple versions for a TPR (#22768 (comment)). @deads2k @wojtek-t

@deads2k
Contributor Author

deads2k commented Jul 27, 2017

/retest

@caesarxuchao
Member

/retest

@caesarxuchao
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 27, 2017
@sttts
Contributor

sttts commented Jul 27, 2017

/approve no-issue

@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: caesarxuchao, deads2k, sttts

Associated issue requirement bypassed by: sttts

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 27, 2017
@fejta-bot

/retest
Automated flake /retester experiment. Please send feedback to fejta

@deads2k
Contributor Author

deads2k commented Jul 28, 2017

> /retest
> Automated flake /retester experiment. Please send feedback to fejta

I'm torn. Are the failures being tracked?

/retest

@fejta
Contributor

fejta commented Jul 28, 2017

Only in aggregate. We're trying to bring better visibility.

@fejta-bot

/retest
Automatic flake /retester. Please send feedback to @fejta

@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 49665, 49689, 49495, 49146, 48934)

@wojtek-t
Member

Cherrypick approved - automated cherrypick in #49767

@wojtek-t wojtek-t added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Jul 28, 2017
@k8s-cherrypick-bot

Commit found in the "release-1.7" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

@deads2k deads2k deleted the controller-12-toleration branch August 3, 2017 20:14
irfanurrehman added a commit to irfanurrehman/kubernetes that referenced this pull request Aug 16, 2017
The way the federated unit tests are done has been updated in master, and the old
approach is no longer the preferred way of doing this test. The tests are, however,
still invoked on 1.7. This is to enable the cherry pick of kubernetes#49495
on 1.7.
k8s-github-robot pushed a commit that referenced this pull request Aug 18, 2017
…95-upstream-release-1.7

Automatic merge from submit-queue

Automated cherry pick of #49495 upstream release 1.7

Cherry pick of #49495 on release-1.7.

#49495: make it possible to allow discovery errors for controllers
@foxish
Contributor

foxish commented Aug 18, 2017

@deads2k @sttts Can this affect 1.6 as well?

@deads2k
Contributor Author

deads2k commented Aug 18, 2017

> @deads2k @sttts Can this affect 1.6 as well?

I don't know.
